

NASA Technical Memorandum 80090 


NASA-TM-80090 19790014612 


EMULATION APPLIED TO RELIABILITY ANALYSIS
OF RECONFIGURABLE, HIGHLY RELIABLE, FAULT
TOLERANT COMPUTING SYSTEMS FOR AVIONICS


GERARD E. MIGNEAULT 


APRIL 1979 




NASA 

National Aeronautics and 
Space Administration 


Langley Research Center 

Hampton, Virginia 23665


Gerard E. Migneault
SUMMARY

This paper proposes that emulation techniques can be a solution to a difficulty arising in the 
analysis of the reliability of highly reliable computer systems for future commercial aircraft, and thus 
should warrant investigation and development. 

The paper first establishes the difficulty, viz., the lack of credible precision in reliability
estimates obtained by analytical modeling techniques. The difficulty is shown to be an unavoidable
consequence of: (1) a high reliability requirement so demanding as to make system evaluation by use
testing infeasible, (2) a complex system design technique, fault tolerance, (3) system reliability
dominated by errors due to flaws in the system definition, and (4) elaborate analytical modeling
techniques whose outputs are quite sensitive to errors of approximation in their input data.

Next, the technique of emulation is described, indicating how its input is a simple description of
the logical structure of a system and its output is the consequent behavior. Use of emulation techniques
is discussed for "pseudo-testing" systems to evaluate bounds on the parameter values needed for the
analytical techniques.

Finally, an illustrative example is presented, albeit for a fanciful small scale application, to
demonstrate from actual use the promise of the proposed application of emulation. 


INTRODUCTION


Research efforts are underway to develop more efficient civil transport aircraft for the future. One 
facet of the effort involves active control technology which implies greater reliance upon computer 
systems in order to obtain maximum benefits. This paper discusses the need for, and justification of,
the development and investigation of emulation techniques as adjuncts to theoretical reliability
analysis models of fault tolerant avionic computer systems.


REQUIREMENT FOR FAULT TOLERANCE 

Designs of fault tolerant computer systems have arisen in response to anticipated needs of future
civil aircraft (Bjurman, B. E. et al., 1976), (Hopkins, A. L. et al., 1978), (Wensley, J. H. et al., 1978).
Requirements for reliability of systems and associated components have been inferred from the expression
"extremely improbable" in regulatory documentation pertaining to safety in commercial transport aircraft
(FAA, 1970). The following, variously worded, informal statements indicate the range of interpretations:

"Thus we have a reliability requirement of 10^-9 per hour of operation for a level 1 or level 2
function with no internal or external backup ..." * (Ratner, R. S. et al., 1973)

"... a number less than or equal to 1 x 10^-9 has been imposed ... to represent the probability of
an event designated as extremely improbable. ... Loss of the CCV/FBW function, given a
fault-free system at dispatch, shall be extremely improbable." ** (Bjurman, B. E. et al., 1976)

"... the computer's failure rate will be designed below 10^-9 failures per hour in flights of up
to ten hours duration, with a preferred goal of 10^-10 failures per hour." (Smith, T. B. et al.,
1978)

"... the extrapolated failure of the design in context with production system application shall
not exceed 10^-9 computer-related system failures in flights up to ten hours." (sic) (NASA, 1978)

As an average of the interpretations, and for discussion purposes, an equally informal statement is
adopted here as the requirement, viz.,

the probability that a system containing no failed components at the start of operation will
fail during the first ten hours of operation will be less than approximately 10^-9

in which the term "failed components" refers, in a conventional manner, to failures caused by physical
defects occurring randomly in time, and in which a system is considered to have failed when it has not
correctly performed the function required of it as a subsystem in a larger, encompassing system.
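
To give a feel for the magnitude of this requirement, the following minimal Python sketch converts a ten-hour failure probability into an equivalent constant failure rate and mean time to failure. The exponential (constant rate) time-to-failure law and the function name are assumptions made for illustration only; they are not part of the requirement as stated.

```python
import math


def rate_from_mission_probability(p_fail: float, mission_hours: float) -> float:
    """Constant failure rate (per hour) that yields p_fail within mission_hours.

    Assumes an exponential time to failure: P(fail by t) = 1 - exp(-rate * t),
    so rate = -ln(1 - p_fail) / t.
    """
    return -math.log1p(-p_fail) / mission_hours


# Requirement from the text: less than about 10^-9 in the first ten hours.
rate = rate_from_mission_probability(1e-9, 10.0)
mttf_hours = 1.0 / rate

print(f"equivalent failure rate: {rate:.3e} per hour")
print(f"equivalent MTTF:         {mttf_hours:.3e} hours")
```

Under that assumption the requirement corresponds to roughly one failure per ten billion operating hours, a figure far beyond what individual devices achieve.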

Temporarily disregarding failures due to causes external to systems or to inadequately or incorrectly
designed and implemented systems, one can determine that, in order to satisfy the reliability requirement,
a computer system constructed of devices (in turn constructed of more basic components) with independent


*Levels pertain to criticality of functions.

**CCV/FBW = control configured vehicle / fly by wire.

[text missing in source] ... which systems can be constructed, do not have such low [illegible], etc., from
[illegible] more reasonable. Consequently, computer systems intended to satisfy the reliability requirement
have been designed to tolerate failures.

[Several lines illegible in source; one legible fragment refers to designs utilizing multifunction devices.]

[...] because the variance of time to failure of fault tolerant systems tends to be [illegible]. [...] devices
have independent failure distributions and constant failure rates, one can show that [illegible] different
from that of its constituent [illegible].

Figure 1 contains a simplified behavior model of an r-out-of-n system. Each state corresponds to a set of
possible configurations having a stated number of operating devices. The transition rate out
of a state is the appropriate multiple of the constant failure rate of a device. Since, given the
occurrence of a component failure, a successful transition to a state of less redundancy
is problematical, so-called "coverage" parameters, conditional probabilities of successful transition
given a failure, are included. Unsuccessful [illegible].
Usually the coverage parameters are associated with systems having active [illegible]
transitions which [illegible].

[illegible] (citation illegible) [illegible] appropriate differential equations for this stochastic process
[illegible] in a straightforward manner that the probability of system failure is
represented by the expression

    1 - e^{-n \lambda t} \sum_{j=0}^{n-r} a_j \left( e^{\lambda t} - 1 \right)^j

where a_0 = 1 and a_j = [expression illegible in source] for j = 1, 2, ..., (n-r).
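
As a hedged illustration, the following Python sketch evaluates the failure probability 1 - e^{-nλt} Σ a_j (e^{λt} - 1)^j of an r-out-of-n system. The recurrence used for a_j is not legible in the source; the one used here, a_j = a_{j-1} C_j (n - j + 1) / j, is an assumption derived from a Markov model in which the j-th device failure is survived with coverage probability C_j and each operating device fails at constant rate λ. The function name and argument names are likewise hypothetical.

```python
import math
from typing import Sequence


def r_out_of_n_failure_probability(
    n: int, r: int, lam: float, t: float, coverage: Sequence[float]
) -> float:
    """Probability that an r-out-of-n system has failed by time t.

    coverage[j - 1] is C_j, the ASSUMED probability that the j-th device
    failure is successfully tolerated (j = 1 .. n - r).
    """
    if len(coverage) != n - r:
        raise ValueError("need exactly n - r coverage parameters")
    x = math.expm1(lam * t)  # e^{lam t} - 1, computed accurately for small lam*t
    a = 1.0                  # a_0 = 1
    total = 1.0              # j = 0 term of the sum
    for j in range(1, n - r + 1):
        # ASSUMED recurrence: a_j = a_{j-1} * C_j * (n - j + 1) / j
        a *= coverage[j - 1] * (n - j + 1) / j
        total += a * x ** j
    return 1.0 - math.exp(-n * lam * t) * total


# Triplex (2-out-of-3) example with perfect and with imperfect coverage.
p_perfect = r_out_of_n_failure_probability(3, 2, 1e-2, 10.0, [1.0])
p_imperfect = r_out_of_n_failure_probability(3, 2, 1e-2, 10.0, [0.99])
print(p_perfect, p_imperfect)
```

With all coverage parameters equal to 1 the expression reduces to the familiar binomial result for independent devices, which is a convenient check on the assumed recurrence. For probabilities near 10^-9 the subtraction from 1 loses several digits in double precision, so the sketch is suited to illustration rather than to certification arithmetic.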


[text illegible in source] for various combinations of values of r, n, and λ [illegible]

is considered. For instance, a device with MTTF [illegible] rate, when the 10^-9 requirement [illegible]
for all [illegible], which is excessively poor coverage since systems are all in zones 9 or higher. In all
cases in both tables, the ratios do not differ from 1 by an order of magnitude. Hence, to the extent that
fault tolerant systems are represented by r-out-of-n systems, a simple and reasonable approximation to the
MTTFs of such systems appears to be simply the MTTF of the "worst" device type, a far cry from the ten
billion (10^10) hour value.

However, having identified a better approximation to MTTF for fault tolerant systems, it is well to 
note that, in the application of interest, the systems will be effectively renewed every ten-hours or so. 
Hence MTTF, in the conventional sense of an unrenewed system used until system failure as computed above, 
is not descriptive of system use. In order to consider the relationship of the reliability requirement to 
safety, it is more meaningful to estimate the probability of system failures, to be considered emergency
situations, during the lifetime of a fleet of aircraft with realistic policies for renewal. Therefore, 
assuming (1) systems meeting the 10^-9 requirement when all failure modes are considered, (2) system
renewal after every ten hours of operation, and (3) a fleet of two thousand (2 x 10^3) aircraft each with a
lifetime of sixty thousand (6 x 10^4) hours, the probability is approximately 0.01 that one or more emergency
situations will occur because of a computer system. It is a matter of judgment, no doubt tempered by 
economics, whether or not any greater risk to safety is acceptable. Indeed this estimate does not con- 
sider latent failures, i.e., conditions where physical defects have occurred but have not yet contributed 
to a data error because the failed components have not been party to a computation. Such a mechanism could 
be modeled as an aging effect on the systems, despite periodic renewals, indicating that the value 0.01
above is optimistic. And this computation has not included any manner of considering increased complexity
as r and n vary.
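
The fleet estimate quoted above can be reproduced with a few lines of Python. The sketch assumes that every ten-hour flight fails independently with probability exactly 10^-9 (the requirement's upper limit) and that renewal restores the system completely; the variable names are illustrative.

```python
import math

p_flight = 1e-9            # failure probability per ten-hour flight (the requirement)
aircraft = 2_000           # fleet size from the text
lifetime_hours = 60_000    # lifetime per aircraft from the text
flight_hours = 10          # renewal interval from the text

flights = aircraft * lifetime_hours // flight_hours  # total ten-hour flights

# P(at least one failure) = 1 - (1 - p)^flights, evaluated without
# cancellation by working with log1p and expm1.
p_any = -math.expm1(flights * math.log1p(-p_flight))

print(f"flights in fleet lifetime: {flights}")
print(f"P(one or more system failures): {p_any:.4f}")
```

The result is about 0.012, consistent with the "approximately 0.01" stated in the text.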

Ironically the increased complexity, while ostensibly contributing to a reduction in the incidence of 
system failures resulting from component and device failures, is a source of residual "definitional flaws"
in systems. The term "definitional flaw" is adopted here to denote an inadvertent system design which,
when the system is in some particular condition with some unexpected data and regardless of the presence 
or absence of conventional component failures or anomalous environments, produces undesirable results which
could have been avoided by another, proper design; the term includes design errors, specification errors 
or inadequacies, missing requirements, etc. It matters not whether the flaw is in software or hardware or 
is the result of the correct implementation of an erroneous or incomplete specification; the root cause is 
human error. One expects the incidence of such flaws to increase with growth in complexity. There is a 
quite large pool of practical experience with such a failure mode (everyone's "bêtes noires", the
software bugs found in operational software systems) which indicates strongly that the failure mode must be
included, in some fashion, in the reliability analysis of complex systems. On the other hand, in the
avionic application of interest, the level of system reliability required effectively precludes the use 
of thorough, lifetime/use testing of actual systems to determine with acceptable confidence (in a 
statistical sense) that the probability of system failure due to residual definitional flaws is compatible 
with the reliability goals and requirement. As a consequence, more analytical methods, for example
(Costes, A. et al., 1978), must be developed and relied upon to address total system (i.e., logic,
largely software, and hardware) reliability with "acceptable credibility".


TECHNIQUES FOR ADDRESSING DEFINITIONAL FLAWS 

Analogously to "hardware redundancy", techniques for designing systems with "logical redundancy" to 
(attempt to) prevent system failures attributable to residual definitional flaws are becoming a subject of
research -- and development. The software fault tolerance studies at the University of Newcastle-upon-Tyne 
are a leading example of recent innovations (Randell, B., 1975). Largely as a result of the sequential 
nature of software algorithms, fault tolerant software has been oriented more to a method of sequential 
test and selection, in accordance with stated acceptance criteria, from among alternate algorithms in a 
software system, rather than to a method of comparison and voting over the results of a number of alternate 
algorithms. But parallel alternate hardware logic or concurrent alternate software algorithms in parallel 
processors are conceivable mechanizations. The "logical redundancy" techniques are therefore seen to 
parallel hardware. 

Fault tolerant software lends itself to an especially simple behavior model, as in Figure 2(a), on 
the assumption that successful recovery from a software (or logic) failure implies immediate return to the 
initial (software) state. The rationale for the assumption is that the flaw responsible for the software 
data error has always been present in the system, having merely not been previously activated, so to speak; 
the system remains ready to function as before (i.e., correctly) once it has survived the software data 
error. Indeed, one might not expect to see a second, identical software error, assuming the initial error
to have been triggered by unusual data not likely to soon be seen again. (As an aside, experiments using
the emulation technique to be discussed suggest themselves to determine whether or not software data errors 
might not better be modeled as error "bursts".) Figure 2(b) is a simpler representation of the same 
recovery/failure process. Again, for the sake of simplicity, software is assumed to have a constant 
failure rate, μ, and fault tolerant software is assumed to have an aggregate recovery parameter, k,
analogous to the coverage parameters of the r-out-of-n hardware model. Immediate system failure is assumed
to be the result of lack of successful recovery. No further elaboration of a software model is attempted 
since there has been no credible empirical evidence available for the selection and justification of any 
particular, more complex, general model of system failure due to software (Thibodeau, R., 1978), let alone
the more general case of residual definitional flaws. 


ANALYTIC RELIABILITY ANALYSIS: HOW CREDIBLE?

The software model of Figure 2 and the r-out-of-n model of Figure 1 suffice, however, to show the
difficulty, when lifetime-use testing of actual systems is not feasible, of establishing with acceptable
confidence (in the statistical sense) that systems designed to satisfy the 10^-9 requirement do achieve the
reliability goal. In Figure 3, the two models are combined to represent simply a system subject to and
tolerant of both hardware component failures and errors due to residual definitional flaws (here, software).
An additional assumption is made, that the software and hardware are independent, to keep the
illustration simple again. It is possible to add more complexity in the model, but as stated before, there
is no empirical evidence to justify selecting any particular model in preference to another. Also, the
conclusion below is not appreciably modified. Again recognizing the model and assumptions as a Markov
process, the probability of system failure is computed to be


    1 - e^{-(n \lambda + \mu (1 - k)) t} \sum_{j=0}^{n-r} a_j \left( e^{\lambda t} - 1 \right)^j

where a_0 and a_j are as before.

For a typical (and optimistic) value for λ ([value illegible in source] failures per hour), typical values
for n (= 3 to 5), and the required value for t (= 10 hours), bounds on C_1, C_2, and μ(1 - k) required, in
order for the system to satisfy the 10^-9 requirement, are calculated to be as follows:

    1 > C_1 > 0.999999
    1 ≥ C_2 > 0.9999
    μ(1 - k) < 10^-10
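
The order of magnitude of such bounds can be checked with a leading-order budget, sketched below in Python. This is not the author's calculation: the device failure rate of 10^-4 per hour is an assumed value (the figure in the source is illegible), n = 5 is one of the typical values mentioned, and each failure mechanism is simply required, on its own, to contribute less than 10^-9 in ten hours. All function and constant names are hypothetical.

```python
from math import comb

REQUIREMENT = 1e-9   # from the text: failure probability allowed in t hours
T_HOURS = 10.0       # from the text
LAMBDA = 1e-4        # ASSUMED device failure rate per hour (source value illegible)


def coverage_bound(n: int, j: int, lam: float = LAMBDA, t: float = T_HOURS) -> float:
    """Smallest C_j keeping the uncovered j-th failure below the requirement.

    Leading-order reasoning for lam * t << 1: the probability of j device
    failures within t is about comb(n, j) * (lam * t)**j, and an uncovered
    j-th failure contributes about that amount times (1 - C_j).
    """
    p_j_failures = comb(n, j) * (lam * t) ** j
    return 1.0 - REQUIREMENT / p_j_failures


def software_rate_bound(t: float = T_HOURS) -> float:
    """Largest mu * (1 - k): 1 - exp(-mu (1 - k) t) is about mu (1 - k) t."""
    return REQUIREMENT / t


print(f"C_1 must exceed about {coverage_bound(5, 1):.7f}")
print(f"C_2 must exceed about {coverage_bound(5, 2):.4f}")
print(f"mu(1 - k) must be below about {software_rate_bound():.1e} per hour")
```

Under these assumptions the budget gives values of the same order as the bounds listed above, which illustrates how little room is left for error in the estimates of coverage and software recovery.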



MORE COMPLEX MODELS

In the process of investigating fault tolerant systems (previously, principally studies of hardware)
numerous models have been developed for analyzing the reliability of such systems. Of late, investigations
have also been undertaken into models to relate the system failure modes to time-variable computational and
performance requirements, thus attaching the reliability of a system more tightly to its application
(Meyer, J., 1977), (Beaudry, M. D., 1978). Some model evaluation schemes have been "computerized" to
serve as more or less general purpose tools for the convenient analysis, in the architectural design stage,
of systems composed of complex arrangements of elements, e.g., CAST (Conn, R. B. et al., 1974), CARE II
(Stiffler, J., 1974), CARSRA (Bjurman, B. E. et al., 1976), ARIES (Ng, Y., 1976). Although [text missing in
source] details of system behavior such as recovery (detection, isolation, reconfiguration) strategies, sparing
(active, stand-by, switching) strategies, transient and intermittent fault (duration, periodicity, leakage)
modes, functional dependence among devices, nonexponential failure distributions, etc., the models still
are constructed essentially from parametric descriptions of aggregate system, subsystem and/or device
behavior in order to make use of mathematical techniques applicable to idealized stochastic process models
and for reasonably efficient computation. Hence all the models must be provided with parameter values,
which need to be assumed or known by some other means, in order to precisely represent any and each
particular system design of interest.


EMULATION


Digital Simulation 

While the word "simulation" is widely used to denote all manner of techniques for, among other
purposes, analyzing the behavior of objects and their environments by means of implementation and
manipulation of more conveniently malleable surrogates, here the word is limited to mean the use of computer
systems as surrogates, at whatever level of abstraction is meaningful to an application. The concept
of system is stressed because usefulness of a simulation scheme depends upon both software and hardware,
a characteristic more effectively utilized by emulation. For example, consider the reliability analysis
programs previously mentioned (CAST, etc.). Although they are essentially simulation schemes which are
normally discussed without regard to host computer hardware, in any actual application, host computer
hardware will be an important constraint upon the amount of detail which it will be feasible to consider
with the programs.


Digital simulation, as opposed to emulation, at the level of gate logic has been discussed in the
literature on computers and considered as a tool for design and fault (signature) analyses of digital logic
circuits at levels of detail ranging from simple (e.g., assuming gates to have only two possible output
values) to complex (e.g., allowing undefined values of gate outputs and various timing anomalies)
(Szygenda, S. and Thompson, E., 1976). For the analysis of circuits the sizes of microprocessors, memories
and larger, in practice simulation techniques at the aggregate, functional behavior level begin to displace
gate level techniques (Menon, P. and Chappell, S., 1977) as the gate level simulation costs become
prohibitive when compared to perceived benefits.


However, for the purposes of reliability analysis of fault tolerant systems, gate level simulation
warrants considerable cost in view of the conclusion to be drawn from the preceding paragraphs that, at
the levels of reliability of interest, the probability of failure of such systems is less dependent upon
the mode of failure resulting from depletion of redundant resources than it is upon the less well
understood and questionably modeled modes considered under the terms "coverage" and "definitional flaws". A
similar conclusion to the effect "that the introduction of a redundancy at the hardware level increases the
relative influence of software faults" is made elsewhere (Costes, A., 1978). Unfortunately, while the costs
could be suffered, in light of the benefits, gate simulation is not a feasible technique for
application to questions involving chance events and repeated trials because it is time consuming: orders of
magnitude slower than likely target systems.


Emulation vs. Simulation 

In ordinary use, the word "emulation" means an endeavor to equal or excel; in the present context, it
is reserved for a particular technique of implementing simulation possible when a host computer is
microprogrammable. In order to avoid confusion, "simulation" acquires the added meaning here of being distinct
from "emulation". Microprogramming is significant because it allows a final definition of a computer's
"apparent" instruction set to be postponed until after the definition of hardwired logic is completed,
and it does this with an acceptably small risk that the hardware logic will need redesign. This happens
because a "real" instruction set is defined by the hardwired logic, is at a quite primitive level, and is
tailored especially for executing algorithms which, in turn, become operational definitions of less
primitive operations: the "apparent" instruction set.

Thus it may be said that a computer defined by an "apparent" instruction set does not really exist;
it is "emulated" by microprogrammable hardware by means of microcoded algorithms. Admittedly, variations
in efficiency of variant microcode operations vis-a-vis various "apparent" instruction sets may exist, but
they can be ignored for the present purpose. What is notable is that, given reasonable care not to
mismatch host and target computers, microprogrammable computers can perform in the role of an "apparent"
computer approximately as efficiently as a hardwired version of the "apparent" computer would. Note that
"emulation" is at a level of detail which permits software implemented for another, "apparent", target
computer to be executed "directly" by a host computer. That is, no modification of the target software is
needed to make it compatible with the host computer, and no special software on the host computer needs to
be generated (more specifically, no simulation program in an "apparent" instruction set on the host to
interpret the instructions of the target software and mimic the target computer) as would be needed on a
nonmicroprogrammable computer.


Use as a Diagnostic Tool 

Addition of diagnostic, control functions in the microcode permits a host computer to act not only as 
a surrogate but also as a device for observing and recording (and possibly analyzing) target software per- 
formance in an ostensibly natural environment. Such "diagnostic emulation" use is becoming more common in 
the development and maintenance of special software systems and is, seemingly, "emulation" in the 
dictionary sense. As might be expected, efficient use of such a diagnostic system requires support
capabilities for readily modifying microcoded algorithms defining target computers. Such facilities are
beginning to be developed, for example, EMULA8 (Clausen, B. et al., 1977). What has been less well
considered is the fact that such capabilities can be extended to permit analysis not only of software but
also of systems (i.e., software + hardware), and not only as they are intended to be but also as they
are not. By generating the defining microcode such that it represents target computers in sufficiently
fine detail, combinations of failures in individual components, anomalous data, and definitional flaws can
be introduced and their effects at the system level observed rather than assumed. Thus emulation provides
a conveniently manipulated failure effects analysis tool. In addition, the manner in which an emulation
technique is implemented, with automated diagnostic and system + environment controls, lends itself to
use for "pseudo-testing" as in Figure 4.

In general, emulation can be used to generate repeated trials of "emulated" systems from which
failure ratios and histograms can be tabulated for analysis; hence, aggregate behavior models can be verified
and parameter values estimated with some measure of confidence (in a statistical sense). Clearly,
assumptions about the manners and rates of occurrence of failures and flaws must still be made in order to
introduce these last into the emulations. However, while the credibility of precise assumptions will still be
questionable, it should be possible to develop credibly pessimistic assumptions to attempt to demonstrate
that particular fault tolerant system designs exceed the reliability requirement.
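
As a hedged illustration of estimating a parameter "with some measure of confidence" from repeated trials, the Python sketch below computes a one-sided upper confidence bound on a failure ratio (a Clopper-Pearson style bound, chosen here for illustration; the paper does not name a statistical method). It also computes how many failure-free trials would be needed to demonstrate a given failure probability directly. All names are hypothetical.

```python
import math


def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def failure_ratio_upper_bound(failures: int, trials: int, confidence: float = 0.95) -> float:
    """One-sided upper bound on the per-trial failure probability.

    Finds, by bisection, the p at which observing `failures` or fewer
    failures in `trials` trials has probability 1 - confidence.
    """
    if failures >= trials:
        return 1.0
    alpha = 1.0 - confidence
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if binomial_cdf(failures, trials, mid) > alpha:
            lo = mid   # mid is still too small to be ruled out
        else:
            hi = mid
    return hi


def trials_needed_with_zero_failures(p: float, confidence: float = 0.95) -> int:
    """Failure-free trials required to bound the failure probability by p."""
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-p))


print(failure_ratio_upper_bound(0, 1000))       # roughly 3 / 1000
print(trials_needed_with_zero_failures(1e-9))   # roughly 3 x 10^9 trials
```

The second figure suggests why pseudo-testing is aimed at bounding intermediate parameters such as coverage, under pessimistic fault assumptions, rather than at demonstrating the 10^-9 requirement by direct trial.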

While, also in general, the use of emulation to perform such "pseudo-testing" is limited by the 
efficiency (i.e., computation speed) of the emulation technique and equipment, it appears reasonable to 
state that it is less restricted than in the case of digital simulation. Given the previously described 
need and difficulty of establishing the reliability of the fault tolerant avionic computer systems of 
interest, emulation techniques merit further investigation.


SAMPLE EXPERIMENT 
Scope 

An effort of limited scale was undertaken in order to determine whether or not an emulation scheme 
could be devised which would be sufficiently efficient to support analyses of target systems of meaningful
sizes and complexities, and to demonstrate that such a scheme could be implemented in a manner convenient 
for analysis purposes by users not well versed, if at all, in the emulation scheme itself. As a demon- 
stration, a sample analysis bearing upon reliability of fault tolerant systems was chosen. 

The effort was experimental; time and effort were expended searching out efficient implementations
and superior microprogramming capabilities to support the implementations. Consequently no commitment to
any specific microprogrammable hardware was desirable initially. The experiment was performed on a large,
general purpose computer whose underlying microcode was sacrosanct. For this reason emulation was really
simulated. This last level of complication can be accounted for by introducing a time scale factor; it is
otherwise ignored here. While some variant emulation algorithms which have been conceived have not yet
been implemented and examined, the effort has provided a basis for selecting microprogrammable hardware
for further studies. Here, however, the experiment is discussed merely to illustrate an actual, rather
than speculated, application of emulation to reliability analysis.


Emulation Technique

The scheme selected consists of an algorithm generated independently of any target computer.
Descriptions of particular systems to be emulated are provided to the algorithm at the time of operation.
The method is referred to as "table-driven" in contrast to a "compilation" method in which a hardware
description is input to a hardware description language "compiler" which generates a computer program to
emulate one specifically defined computer. The table-driven method was chosen because it was believed to
facilitate the infusion of failures and to provide better visibility to a user. That is, the target
hardware is visible as a distinct entity at emulation time rather than being dispersed and buried inside the
workings of an emulation program, and failures and faults can be added and removed without altering the
cyclic nature of the algorithm.

From a user's viewpoint, the emulation is visualized as the repeated transformations of two variables.
One variable, S_n, describes the structure of the system at time step n. The variable is essentially a
matrix which identifies the interconnections among the logic elements in a system, and also identifies the
functional behavior of each element. The most primitive element permitted is a generalized gate to which
constant behavior characteristics (neither correct nor faulty to the emulation algorithm) are attached.
More complex elements such as flip-flops and tristate devices are also permitted, if desired, as primitive
elements to be manipulated as indivisible entities by the emulation algorithm. (For the experiment, the
algorithm was limited to elements with scalar output values.) For example, a logic element X might have
been identified to act as a four (4) input NAND gate driving six (6) other identified elements and supposed
to have an irregular input-to-output signal propagation time. Hence, S_n is effectively a time-varying,
annotated logic diagram.

A second variable, V_n, is a vector containing the output values, at time step n, of each of the logic
elements defined in S_n. Target software corresponds to a subset of this variable, viz., those values
corresponding to logic elements defining some of the emulated system's memory.

A third auxiliary variable, F_n, can be visualized as a source of external perturbations into the
emulated system, affecting S_n, V_n, or both. As currently implemented, this variable is generated
separately from the others in order to increase the speed of the emulation computations. It represents
the source of random failures, flaws, and anomalies at either preselected or random times and control over
the emulation process.

The emulation algorithm, a time invariant transformation, is a collection of techniques (so-called
"selective trace", linked lists, data compression, event scheduling, and parallel processing, the last
untested because of the limitations of the general purpose computer previously mentioned) consistent with
a model of the behavior of a "generalized" logic element over an arbitrary time step.
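
The roles of the structure table, the value vector, and the perturbation source can be illustrated with a toy table-driven emulator in Python. This is a minimal sketch, not the algorithm used in the experiment: it assumes unit-delay gates with two-valued outputs, evaluates every element on every step (no selective trace or event scheduling), and models a stuck-at fault as a change to the structure table. The names `S`, `V`, and `F` mirror the variables described above; everything else (element kinds, the example circuit) is hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Behavior attached to each kind of primitive element (illustrative set).
BEHAVIOR: Dict[str, Callable[[Sequence[int]], int]] = {
    "NAND": lambda ins: int(not all(ins)),
    "AND": lambda ins: int(all(ins)),
    "OR": lambda ins: int(any(ins)),
    "NOT": lambda ins: int(not ins[0]),
    "STUCK0": lambda ins: 0,   # stuck-at-0 fault
    "STUCK1": lambda ins: 1,   # stuck-at-1 fault
}

Element = Tuple[str, Tuple[int, ...]]   # (kind, indices of driving elements)
Perturbation = Tuple[int, str]          # (element index, new kind)


def step(S: Sequence[Element], V: Sequence[int]) -> List[int]:
    """One time step: every element recomputes its output from the old V."""
    new_v = list(V)
    for i, (kind, sources) in enumerate(S):
        if kind == "INPUT":
            continue                     # externally driven; value is held
        new_v[i] = BEHAVIOR[kind]([V[j] for j in sources])
    return new_v


def emulate(S: Sequence[Element], V: Sequence[int],
            F: Dict[int, List[Perturbation]], steps: int) -> List[int]:
    """Apply the perturbations scheduled in F, then step, `steps` times."""
    S, V = list(S), list(V)
    for n in range(steps):
        for index, kind in F.get(n, []):
            S[index] = (kind, S[index][1])   # fault alters the structure table
        V = step(S, V)
    return V


# Example target: exclusive-OR built from four NAND gates (elements 2 to 5).
XOR_TABLE: List[Element] = [
    ("INPUT", ()), ("INPUT", ()),
    ("NAND", (0, 1)), ("NAND", (0, 2)), ("NAND", (1, 2)), ("NAND", (3, 4)),
]


def xor_output(a: int, b: int, F: Dict[int, List[Perturbation]] = None) -> int:
    return emulate(XOR_TABLE, [a, b, 0, 0, 0, 0], F or {}, steps=4)[5]


fault = {0: [(3, "STUCK1")]}   # element 3 stuck-at-1 from time step 0
print(xor_output(1, 0), xor_output(1, 0, fault))   # fault changes the output
print(xor_output(0, 0), xor_output(0, 0, fault))   # fault stays latent
```

Because the target description is data rather than code, the same `emulate` loop serves any circuit, and a fault is introduced or removed by editing one table entry. The second printed line shows a case where the fault has no visible effect for the applied inputs, which is the kind of latency examined in the sample analysis.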


SAMPLE ANALYSIS: LATENT FAILURES 

The experimental analysis performed was a study of the efficacy of five (5) particular algorithms, 
each with a different instruction mix, as detectors of component "stuck-at" faults (i.e., latent failures) 
in a particular "play" system. The analysis is documented in detail in (Nagel, P., 1978).

The "play" target computer was originally generated (i.e., defined at the gate logic level) as a 
vehicle for checking out the initial and modified versions of the emulation algorithms, and for
demonstrating the ability of support software, a hardware description language translator and meta-assembler
for regenerating target software, to respond semiautomatically to hardware design changes. The "play" 
computer has a memory of 8192, 16 bit wide words, a CPU with a count of approximately 2000 gate equivalents 
and a single input-output register/port. The logic is arbitrarily assigned to four (4) hypothetical chips: 
a "clock" chip, an "adder" chip, an "op-decode" chip, and a miscellaneous odds and ends chip. The instruc- 
tion set contains about a dozen basic instructions. 

The emulated system trials were simple. The five algorithms, ranging in length from about a dozen
instructions to several hundreds, were repeatedly executed, with randomly selected initial data, and
randomly selected faults of random components. Distributions of time from fault occurrence to fault
detection (i.e., fault latency duration) were generated. Two analyses of the sort that would be of
interest in studies of fault tolerant systems were made. For one, the observed distributions were fitted
against commonly used mathematical models, e.g., exponentials, as would be done in order to determine
models and parameter values for use in reliability analysis programs. The results, of course, are not
significant, owing to the fanciful nature of the input data; still, it is interesting that the
distributions were best fit by models of balls selected at random from urns. Another result, that the
distributions each exhibited different nonzero probabilities of never detecting the faults, was predictable,
but only an experiment of this nature could determine the differences in magnitude. A second effort was a
search for correlations among the distinguishable characteristics of the algorithms and the distributions.
The only significant correlation found was between instruction mix and detection probability. Here too,
because of the nature of the target system, the magnitudes of the correlations can only be considered
fanciful. But the concept is useful in considering characteristics which should be avoided in algorithms
whose function is to reconfigure a system after a failure has been detected.
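
One simple way to summarize latency data of this kind is sketched below in Python. It is an illustration only: the sample data are invented, and the model (a fraction q of faults is never detected, while the remainder are detected on each execution with a fixed probability p, as when drawing with replacement from an urn) is an assumed reading of the "urn" description, not the model fitted in the study.

```python
from typing import List, Optional, Tuple


def summarize_latencies(latencies: List[Optional[int]]) -> Tuple[float, float]:
    """Return (q_hat, p_hat) for a 'never detected or geometric' model.

    latencies holds, per injected fault, the number of executions until
    detection (1, 2, ...) or None if the fault was not detected within the
    observation window.
    """
    detected = [x for x in latencies if x is not None]
    q_hat = 1.0 - len(detected) / len(latencies)   # fraction never detected
    p_hat = len(detected) / sum(detected)          # geometric maximum likelihood
    return q_hat, p_hat


def undetected_probability(m: int, q: float, p: float) -> float:
    """P(fault still undetected after m executions) under the assumed model."""
    return q + (1.0 - q) * (1.0 - p) ** m


# Invented sample: eight injected faults, two never detected.
sample = [1, 2, 1, 3, None, 1, None, 2]
q_hat, p_hat = summarize_latencies(sample)
print(q_hat, p_hat)
print(undetected_probability(5, q_hat, p_hat))
```

Note that "never detected" can only be observed as "not detected within the trial length", so q_hat is an upper estimate that tightens as trials are lengthened.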


CONCLUSION 

A case has been made for the use of emulation techniques as a needed adjunct to reliability analysis
models for highly reliable avionic computer systems. Although no conclusion about the technique's
eventual usefulness is yet warranted, in light of its apparent usefulness as a failure modes effects
analysis tool and the promise and potential rewards of its use for probability distribution uses, further
development and investigation of the technique appears warranted and is being pursued by NASA.

Figure 2. Fault tolerant software: (a) and (b)

Figure 3. r-out-of-n system with fault tolerant software

Figure 4. Emulation "pseudo-testing" activity